Cell Systems
○ Elsevier BV
Preprints posted in the last 90 days, ranked by how well they match Cell Systems's content profile, based on 167 papers previously published here. The average preprint has a 0.56% match score for this journal, so anything above that is already an above-average fit.
Shoyer, T. C.; Di Ventura, B.
Transcription factors (TFs) respond to external stimuli with time-varying changes in activity or localization (TF dynamics), driving differential transcriptional programs. Previous studies indicated that TF dynamics can be decoded at the promoter level in eukaryotes, yet a systematic understanding of robust solutions is lacking. By computationally screening over 10,000 mathematical models of multi-state promoters with various forms of TF-mediated regulation, we identify robust configurations that selectively respond to sustained ("pulse filtering") or pulsatile ("pulse boosting") TF dynamics. Promoters that activate via intermediate states and have negatively regulated deactivation robustly perform pulse filtering. In contrast, robust pulse boosting is achieved by promoters with a TF-mediated refractory state that permits short activation and recovers between pulses. Bifunctional TFs that exert activator- and repressor-like regulation extend the design space for pulse boosting. These results reveal general principles by which promoters interpret TF dynamics and suggest strategies to engineer synthetic systems to exploit them.
Highlights:
- Computational screen of over 10,000 promoter models identifies features that enable promoters to selectively respond to sustained ("pulse filtering") or pulsatile ("pulse boosting") transcription factor (TF) dynamics.
- Promoters that activate via intermediate states and have negatively regulated deactivation robustly perform pulse filtering.
- Promoters with TF-regulated refractoriness robustly perform pulse boosting.
- Promoters regulated by bifunctional TFs extend the design space for pulse boosting.
Lopez-Malo, M.; Maerkl, S. J.
Transcription factors (TFs) regulate gene expression by binding cis-regulatory DNA elements, yet how trans-regulatory characteristics such as TF affinity, concentration, and localization interact with cis-regulatory elements remains largely unclear. We systematically analyzed TF affinity mutants across abundance and localization states and found that promoter binding-site strength most readily modulated expression levels, followed by TF localization and concentration, while affinity variations were mainly buffered. We further uncover performance trade-offs between TF abundance, localization, and affinity. Together, these results reveal how trans and cis factors collectively shape gene-regulatory output.
Visani, G. M.; Verma, A.; DeWitt, W. S.
Recent work from Tran et al. (Science, 2026) introduced MULTI-evolve, a framework for protein engineering that combines single-mutant nomination via a protein language model (PLM) or a deep mutational scan (DMS), experimental single- and double-mutant characterization, and neural networks to engineer hyperactive multimutant proteins. The authors attribute the framework's performance to "epistasis-aware modeling" and claim that their neural networks "learn the epistatic landscape" and "identify synergistic interactions" from limited double-mutant training data. Additive models, by definition, cannot represent epistasis, making them a natural null baseline for such claims. Here we show that MULTI-evolve's multimutant predictions are almost perfectly correlated with those of an additive model across all three engineering applications (APEX, dCasRx, and HuABC2), such that the engineering of multimutants reduces to combining beneficial mutations with the largest additive effects, a standard protein engineering strategy for over four decades. We also find that MULTI-evolve's neural networks do not outperform an additive model on held-out test set predictions, and do not even represent epistasis in their training data. Finally, we revisit a DMS benchmark finding presented as evidence of epistasis learning and show that the same pattern is expected even under a null additive model, due to an elementary statistical phenomenon; when we fit an additive model to the benchmark data, it reproduces the reported pattern. More broadly, our findings underscore the need to benchmark models for machine learning-guided directed evolution against additive null baselines before attributing performance to learned epistasis.
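To make the additive null baseline concrete, here is a minimal sketch in Python. It assumes effects are measured on an additive scale (for example, log activity) and uses made-up mutation names and values; it is not the preprint's code or data. A multimutant is predicted as the wild-type value plus the sum of single-mutant effects, and pairwise epistasis is the deviation of a measured double mutant from that prediction.

```python
def additive_prediction(wild_type, single_effects, mutations):
    """Additive null model: wild type plus the sum of single-mutant effects."""
    return wild_type + sum(single_effects[m] for m in mutations)


def pairwise_epistasis(wild_type, single_effects, pair, measured_double):
    """Deviation of a measured double mutant from the additive expectation."""
    return measured_double - additive_prediction(wild_type, single_effects, pair)


# Hypothetical toy measurements on an additive (e.g. log) scale.
wild_type = 1.0
single_effects = {"A10G": 0.5, "L42F": 0.3, "K77R": -0.2}  # mutant minus wild type
measured_doubles = {("A10G", "L42F"): 1.8, ("A10G", "K77R"): 1.6}

predicted = {
    pair: additive_prediction(wild_type, single_effects, pair)
    for pair in measured_doubles
}
epistasis = {
    pair: pairwise_epistasis(wild_type, single_effects, pair, value)
    for pair, value in measured_doubles.items()
}
```

In this toy data the first pair matches the additive expectation, while the second deviates from it, which is the kind of signal an additive model cannot represent by construction.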
Cheng, X.; Li, P.; Guo, H.; Liang, Y.; Gong, J.; de Vazelhes, W.; Gou, C.; Xie, P.; Song, L.; Xing, E. P.
A virtual cell is a world model of a cell: a computational system that predicts, simulates and programs cellular processes across modalities and scales. An important path toward this goal is to model how genetic and chemical perturbations give rise to transcriptional responses, a core capability for disease understanding and drug discovery. However, current approaches remain expert-intensive, relying on iterative manual model design, training and debugging over months. Here we present VCHarness, an autonomous AI system that constructs perturbation-response models by combining an AI coding agent with multimodal biological foundation models. The system explores large spaces of architectures and training pipelines with minimal human intervention, iteratively generating, evaluating and refining candidate models. Across multiple perturbation-response benchmarks, VCHarness identifies architectures that outperform expert-designed approaches while reducing development time from months to days. It further uncovers non-obvious architectural patterns associated with improved performance, indicating that automated search can extend beyond conventional design strategies. These results suggest a shift from manually engineered models toward autonomous systems for constructing components of virtual cell world models, enabling scalable and data-driven exploration of cellular systems.
Badkul, A.; Mottaqi, M.; Xie, L.; Xie, L.
Protein post-translational modifications (PTMs), particularly phosphorylation, serve as the primary "molecular switches" that orchestrate cellular signaling and drug response. While PTM dysregulation is a hallmark of cancer and neurodegeneration, the lack of standardized, drug-perturbed datasets has hindered the development of predictive models capable of capturing context-dependent PTM responses. Effective predictive modeling must therefore integrate multidimensional data, including the specific drug, dosage, treatment duration, cellular background, and the modified site. However, existing PTM resources remain largely static and fail to capture drug-induced regulation across these critical dimensions. To address this gap, we present DrugPTM-Bench, a curated, large-scale benchmark derived from decryptM dose-dependent PTM measurements, standardizing site-level drug response across 7 cancer cell lines, 27 drugs, and 11,167 proteins. Comprising 99.5% phosphorylation events, the dataset includes six time points, 16 dosage levels, and pEC50 potency values (half-maximal effective concentration). We formulate a classification task to identify upregulated, downregulated, or unchanged PTM sites following drug treatment, a critical step in deciphering drug mechanism of action (MoA) and target engagement. Our evaluation reveals that in a protein-disjoint out-of-distribution (OOD) setting, baseline machine learning and deep learning models struggle to recover minority regulation classes, while standard rebalancing strategies improve recall only at the cost of precision and overall F1-score. These results indicate that current methods do not learn robust decision boundaries between regulated and unchanged PTM events. DrugPTM-Bench provides a phosphoproteomics benchmark for modeling drug-induced PTM regulation in imbalanced biological settings.
Beyond classification, DrugPTM-Bench's retention of pEC50 values, drug perturbation profiles, and site-level sequence context enables additional predictive tasks including drug potency regression, mechanism-of-action prediction from PTM fingerprints, and drug-specific PTM site sensitivity ranking, establishing a multi-task benchmark for PTM-centric drug discovery. Ultimately, DrugPTM-Bench establishes a rigorous framework for developing robust, context-aware models to elucidate drug MoA and signaling dynamics.
Xu, T.; Hu, Z.; Sun, X.; Xiong, M.
Omics-based disease-gene discovery is typically performed as if molecular states evolve independently of tissue mechanics. Most current pipelines analyze transcriptomic or multimodal molecular data alone and identify abnormal genes using differential expression, latent trajectories, or association-based recovery under treatment. However, in mechanically active tissues, gene expression is shaped not only by internal regulatory networks but also by mechanotransduction arising from strain, curvature, force transmission, and scaffold geometry. This raises a fundamental question: should disease-gene identification in tissues be treated as a pure omics association problem, or as a causal mechanochemical inference problem? We introduce a mechanotransduction-aware causal omics framework on a Cosserat tissue scaffold. Gene expression evolves through intrinsic regulatory dynamics, spatial diffusion, external control, and a mechanotransduction term driven by scaffold mechanics. To distinguish causation from association, we define a hidden mechano-drug rescue channel in the true data-generating system and compare predictive models that either include or omit mechanotransduction. We show that association-based rankings can incorrectly elevate downstream homeostatic or repair genes, even when the disease gene is the true direct mechanochemical target. By contrast, a causal ranking based on reconstruction of the direct mechanotransduction intervention effect correctly identifies the disease gene as the strongest beneficiary. These results argue that the prevailing pure-omics analysis is insufficient for disease-gene discovery in mechanically structured tissues. Mechanotransduction should be modeled as part of the causal structure of tissue biology rather than treated as a secondary covariate or omitted entirely.
Duncan, A. G.; Consens, M. E.; Crawford, L.; Mitchell, J. A.; Moses, A. M.; Yang, K. K.; Lu, A. X.
Deep learning has been instrumental in our understanding of how enhancers encode regulatory information in their DNA sequence and has demonstrated preliminary success with enhancer design. However, the prevailing approach for enhancer design, cell-type label conditioning, depends on labeled data from massively parallel reporter assays, which exist for only a handful of cell types. We propose EnhancAR, an autoregressive model trained on sets of unaligned homologous enhancer sequences to learn the function of the enhancer conserved over evolution and generate sequences that resemble real homologs. Trained on 1.7 million human enhancer homolog sets spanning 1,888 cell types, EnhancAR generates enhancers for a variety of contexts without being conditioned on a cell-type label. We computationally validate that when conditioned on a set of enhancer homologs, EnhancAR generates novel and diverse sequences that preserve the functional properties of the homologs. By prompting EnhancAR with homologs for existing cell-type-specific enhancers, we design enhancers with similar predicted cell-type specificity. We further demonstrate that when trained on length-sorted homologs, EnhancAR can design enhancers shorter than the conditioning homologs that preserve the predicted activity. In summary, we find that leveraging evolutionary information in enhancer homologs enables a more flexible and general paradigm for designing enhancers with specific functions.
Hendrychova, V.; Brinda, K.
One important question in bacterial genomics is how to represent and search modern million-genome collections at scale. Phylogenetic compression effectively addresses this by guiding compression and search via evolutionary history, and many related methods similarly rely on tree- and ordering-based heuristics that leverage the same underlying phylogenetic signal. Yet, the mathematical principles underlying phylogenetic compression remain poorly understood. Here, we introduce the first formal framework to model phylogenetic compression mechanisms. We study genome collections represented as RLE-compressed SNP, k-mer, unitig, and uniq-row matrices and formulate compression as an optimization problem over genome orderings. We prove that while the problem is NP-hard for arbitrary data, for genomes following the Infinite Sites Model it becomes optimally solvable in polynomial time via Neighbor Joining (NJ). Finally, we experimentally validate the model's predictions with real bacterial datasets using an exact Traveling Salesperson Problem (TSP) solver. We demonstrate that, despite numerous simplifying assumptions, NJ orderings achieve near-optimal compression across dataset types, representations, and k-mer ranges. Altogether, these results explain the mathematical principles underlying the efficacy of phylogenetic compression and, more generally, the success of tree-based compression and indexing heuristics across bacterial genomics.
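The optimization over genome orderings can be illustrated with a small Python sketch. It assumes a binary genome-by-SNP matrix with made-up values, and it uses exhaustive search as a stand-in for the exact TSP solver and for Neighbor Joining, neither of which is reproduced here. The sketch also checks a simple identity: the total number of RLE runs over all columns equals the number of columns plus the summed Hamming distance between consecutive genomes, which is why ordering the genomes resembles finding a short path through them.

```python
from itertools import permutations


def rle_runs(column):
    """Number of runs in the run-length encoding of one column."""
    return sum(1 for i, v in enumerate(column) if i == 0 or v != column[i - 1])


def total_runs(matrix, order):
    """Total RLE runs over all columns when genomes (rows) are placed in `order`."""
    n_cols = len(matrix[0])
    return sum(rle_runs([matrix[g][c] for g in order]) for c in range(n_cols))


def path_cost(matrix, order):
    """Number of columns plus Hamming distances between consecutive genomes."""
    hamming = sum(
        sum(a != b for a, b in zip(matrix[g], matrix[h]))
        for g, h in zip(order, order[1:])
    )
    return len(matrix[0]) + hamming


def best_order_bruteforce(matrix):
    """Exhaustive search over orderings; feasible only for a handful of genomes."""
    return min(permutations(range(len(matrix))), key=lambda o: total_runs(matrix, o))


# Hypothetical toy matrix: rows are genomes, columns are SNP presence/absence.
snps = [
    [1, 0, 0],  # genome 0
    [0, 1, 1],  # genome 1
    [1, 0, 1],  # genome 2
    [0, 1, 0],  # genome 3
]

identity_order = (0, 1, 2, 3)
best_order = best_order_bruteforce(snps)
```

In this toy case the input order gives 11 runs, while the best ordering, which places similar genomes next to each other, gives 7.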
Barreto, Y. B.; Jongman, E. P. H.; Patino-Ruiz, M. F.; Grundel, D. A. J.; Uysal, M.; Coenradij, J.; Poolman, B.; Heinemann, M.
When exposed to a nutrient, cells activate metabolism by reorganizing metabolite pools and enzyme expression to approach the maximal growth rate permitted by physicochemical constraints. While these constraints define reachable steady states, here we propose that the Gibbs energy accessible at activation further limits which states are reached. Using minimal metabolic models, we find that limited accessible Gibbs energy can trap cells in low-growth states by constraining metabolic reorganization and imposing a proteomic burden on transport and phosphorylation reactions. To investigate this experimentally, we reconstituted the arginine deiminase pathway in vesicles, revealing that the size of a conserved pool of interconverting metabolites (arginine, citrulline, and ornithine) determines accessible Gibbs energy and constrains steady-state ATP production rate, a proxy for growth. Together, these results indicate that cellular metabolism retains memory of its initial energetic state, with accessible Gibbs energy at activation acting as a thermodynamic constraint on long-term growth.
Thiel, M.; Cunningham, A.; Barnes, C. P.
Reinforcement learning has driven the mass adoption of large language models by unlocking unexpected capabilities, yet this approach remains largely underexplored for generative DNA models. We investigate whether similar post-training techniques can induce emergent biological realism in DNA language models, using plasmid generation as a testbed due to plasmids' relative simplicity, well-characterized functional constraints, and ubiquity in biotechnology. Using Group Relative Policy Optimization with a reward function based on constraints from engineered biology, our model achieves a 77% quality control pass rate compared to 5% for the pretrained baseline. Remarkably, beyond explicitly optimized features, the model exhibits surprising biological parallels: generated sequences match natural plasmids in thermodynamic stability, codon usage patterns, and ORF length distributions, properties not explicitly optimized in the reward function. These results suggest that RL post-training can steer DNA language models toward biologically coherent regions of sequence space, analogous to how such techniques unlock unexpected capabilities in natural language models, particularly in verifiable domains.
Simensen, V.; Almaas, E.
Metal-binding proteins account for nearly half of the characterized proteome, and they rely on metal-binding sites (MBSs) as critical determinants of their structural stability and biological function. However, methods for comparing their local binding environments lag behind those for whole-structure alignment. Here, we represent MBSs as atomic point clouds surrounding bound metal ligands and align them with a fine-tuned iterative closest point algorithm. Applying this framework to a redundancy-reduced collection of MBSs derived from all metalloproteins in the Protein Data Bank (PDB), we perform pairwise alignments across 23,342 sites to construct a similarity network of metal-binding environments. The resulting network topology recapitulates metal coordination chemistry and enzyme function: links are strongly enriched within metal types and across shared EC subclasses. Conserved metalloenzyme families form cohesive subnetworks; for example, the binuclear ureohydrolase domain appears as two tightly connected components that also capture atypical members such as the dinickel metformin hydrolase. We observe only a moderate global association between protein sequence and MBS geometry, yet many network links connect near-identical binding-site architectures across proteins with low sequence identity, consistent with either divergent evolution with local MBS conservation or candidate cases of molecular convergent evolution. Integrating network proximity with structural evidence of drug binding identifies drugs with enriched connectivity among their targets and predicts 528 drug-off-target combinations across 88 drugs and 151 human proteins, recovering both known off-targets (e.g., ADAM/ADAMTS for matrix metalloproteinase inhibitors) and proposing novel ones. The MBS network thus provides a scalable resource for probing metalloprotein evolution, functional convergence, and the structural basis of drug cross-reactivity. 
Author summary: We study how metals shape protein structure and function by comparing metal-binding sites (MBSs) rather than whole proteins. We represent each MBS as a point cloud of atoms surrounding the bound metal and align 23,342 sites from the Protein Data Bank (PDB) with a fine-tuned iterative closest point algorithm. This yields a similarity network whose links mirror metal coordination chemistry and enzymatic roles: sites binding the same metal or sharing enzyme classes cluster together, and conserved metalloenzyme families (e.g., binuclear ureohydrolases) form tight subnetworks that also capture atypical members such as a dinickel metformin hydrolase. Because highly similar MBS geometries often link proteins with low sequence identity, the MBS network highlights candidates consistent with either divergent evolution with locally conserved MBS architecture or convergent evolution toward similar coordination geometries in otherwise unrelated protein contexts. Overlaying known drug-binding sites lets us flag drugs whose targets are tightly connected and propose plausible off-targets, recovering known matrix metalloproteinase off-targets and suggesting new ones. Our approach offers a scalable map of metalloprotein relationships useful for studying evolution and anticipating drug cross-reactivity.
Farinas, M.; Bermudez, V.; Tsirvouli, E.; Zobolas, J.; Aittokallio, T.; Lehti, K.; Flobak, A.; Lippestad, K.
Effective drug combination therapies can improve cancer treatment, yet the mechanistic basis of drug synergy remains poorly understood. Most computational approaches prioritize predictive accuracy over molecular mechanistic interpretability and hence provide limited insight into how synergistic effects emerge across signalling contexts. We developed Trafikk, a molecular-signalling network-based framework that simulates drug perturbations in cell line-specific computational models to mirror functional outcomes in experimental combination screens. Across two independent large-scale datasets, Trafikk identified synergistic combinations with >77% recall. Functional response predictions revealed both conserved and context-dependent mechanisms. While AKT-MEK co-inhibition consistently disrupted coordinated survival and apoptotic signalling in 742 cell lines, PI3K-BCL2 synergy arose through distinct death programs shaped by cell-context-specific network constraints. Trafikk combines predictive performance with mechanistic interpretability, capturing how and why drug synergy emerges across cellular contexts. Source code, installation instructions and usage tutorial are freely available at https://github.com/druglogics/trafikk.
Maxian, O.; Munro, E.; Dinner, A.
A key question in cell biology is how cell-scale organization emerges from a given set of molecular players and rules of interaction. Given its multiscale nature, addressing this question requires a combination of experimental perturbation, mathematical modeling, and parameter inference. We leverage recent advances in each of these fields, focusing in particular on neural-network methods for simulation-based inference, to study how cell-scale patterns of Rho GTPase activity are defined by molecular-scale activator-inhibitor interactions with filamentous actin. We show that variations in F-actin assembly dynamics can be inferred directly from experimental data by combining a mathematical model with a neural network trained to associate parameter sets with data. Our neural approach differentiates data sets more precisely than traditional summary statistics, and yields a complete and robust likelihood function for each data set. Utilizing the trained network, we demonstrate how RhoGAP tunes RhoA waves via interaction with F-actin. After showing that the known functions of RhoGAP are insufficient to explain experimentally observed dynamics, we use neural methods to infer that RhoGAP must, at a minimum, also decrease filament nucleation rates to sustain waves. Our work yields specific, experimentally testable predictions and illustrates how a combination of traditional forward models and modern inference tools can aid in unraveling mechanisms of self-organization.
Tien, H.; Meda, R. S.; Shastry, S.; Mysore, V.
Generalizable protein-expression prediction can accelerate protein engineering, inform disease mechanisms, and help optimize heterologous recombinant protein production. Protein expression is governed by many interacting parameters that no single omics view captures. We develop Aiki-XP, a multimodal platform integrating four biological scales (genome, operon, coding sequence, protein) plus biophysical features across 492,026 genes from 385 bacterial species. Aiki-XP predicts within-species relative abundance (per-species z-score rank), not absolute copies per cell. Under a leakage-controlled gene-operon split, Aiki-XP reaches Spearman ρ_nc = 0.592 versus 0.509 for ESM-C 600M alone, and each tier of a monotone protein → operon → genome deployment ladder yields a statistically significant gain; a five-recipe rank-average ensemble adds a further +0.016. All recipes were locked before external evaluation; transfer to heterologous, cross-species, and novel-phylum benchmarks demonstrates utility and limits. Ablations and scaling experiments identify operon-scale genomic context, not protein-language-model capacity, as the rate-limiting input at this scale; one foundation model per biological scale suffices, with same-scale stacking adding little.
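As an illustration of what a rank-average ensemble does, the following Python sketch converts each model's predictions to ranks and averages the ranks per item. The three toy "recipes" and their scores are hypothetical, and breaking ties by position is an assumption made here; the preprint's exact ensembling procedure may differ.

```python
def ranks(values):
    """Ranks 0..n-1 by ascending value; ties are broken by position (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def rank_average(predictions):
    """Average the per-model ranks of each item across all models."""
    per_model = [ranks(p) for p in predictions]
    n_items = len(predictions[0])
    return [sum(r[i] for r in per_model) / len(per_model) for i in range(n_items)]


# Hypothetical scores from three models for the same three genes.
model_a = [0.1, 0.9, 0.4]
model_b = [10.0, 30.0, 20.0]  # different scale, same ordering as model_a
model_c = [-2.0, -1.0, 5.0]   # disagrees on the top two genes
ensemble = rank_average([model_a, model_b, model_c])
```

Because only ranks enter the average, models with very different output scales contribute equally, which suits a rank-based metric such as Spearman correlation.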
Carriere, L.; Huyghe, A.; Pajkos, M.; Bernado, P.; Cortes, J.
Intrinsically disordered proteins and regions (IDRs) are central to a multitude of biological processes. Despite extensive studies of their structural and physicochemical properties, the rational design of IDRs with defined conformational behavior remains challenging due to their ensemble nature. Here we present a generative framework for designing disordered protein sequences conditioned on target conformational ensemble descriptors using protein language models (pLMs). We formulate IDR design as the task of generating amino acid sequences predicted to realize specified biophysical properties and implement a Transformer encoder-decoder architecture that maps numerical descriptors to protein sequences. By training models on datasets spanning two orders of magnitude in size, we show that accurate control of conformational and physicochemical properties is achieved only at large data scale. These results demonstrate the feasibility of conditioning generative models on ensemble-level descriptors for IDR design. More broadly, these results support a data-centric paradigm for protein engineering, in which data availability emerges as a key limiting factor for the accurate design of IDRs.
Bixby, E.; Brunner, G.; Danciu, D.; Dela Rosa, R.; Deutschmann, N.; Ferragu, C.; Geiger, F.; Holberg, C.; Kidger, P.; Lindoulsi, A.; Lutz, N.; McColgan, T.; Milius, S.; Shah, J.; Vandeloo, M.; Vidas, P.; Ziegler, J. D.; van Rossum, H.; van der Vorm, D.; Baldi, N.; IJSpeert, C.; Monza, E.; Schriek, A.
Lead optimization remains the longest and most expensive step in pre-clinical drug discovery, typically consuming 12-36 months whilst costing $5M-$15M per candidate. We introduce CRADLE-1, an automated framework for protein engineering. While CRADLE-1 supports the full process of drug discovery and industrial protein engineering pipelines, including hit identification and de novo binder design, this work focuses on its application to multi-property lead optimization across protein modalities (VHHs, scFvs, IgGs, peptides, enzymes, CRISPR systems, vaccines). We show it is 4-7x faster than rational design, as measured by the number of wet lab rounds required. We provide in vitro validation across all of the above modalities, typically optimizing multiple properties simultaneously (single and polyspecific binding down to picomolar, activity, thermostability, ...). Technically, CRADLE-1 starts with pre-trained foundation protein language models (PLMs), which are fine-tuned in unsupervised fashion on evolutionary neighborhoods, in supervised fashion using lab-in-the-loop data, and then deployed in a multi-model workflow. Of additional interest, we find that (a) the end-to-end system may be run in automated fashion; (b) wet lab data may be consumed in black box fashion without knowledge of the underlying biochemical mechanisms; (c) structural data may largely be superseded by sequence-function pairs.
Anzum, H.; Kochat, V.; Satpati, S.; Mahmud, M. I.; Dwarampudi, J. M. R.; Rai, K.; Shukla, P.; Javle, M.; Kwong, L.; Banerjee, T.
Understanding how neighboring cells influence cellular states is central to spatial transcriptomics, yet most existing methods rely on correlation or predefined ligand-receptor (LR) pairs and do not explicitly test directionality. We introduce a counterfactual, intervention-based framework for inferring directional cell-cell influence that is LR-agnostic and tests sender specificity. A neighborhood-conditioned graph model predicts receiver cell state from local spatial context. Directional influence is quantified by counterfactually replacing neighbors of a candidate sender type and measuring the resulting displacement in predicted receiver state. We define a Counterfactual Directionality Score (CDS) that quantifies directional influence, and compute pair-level CDS by aggregating across receiver cells and test cores for each ordered sender-receiver pair. Applied to Xenium cholangiocarcinoma tissue microarrays (38 cores), the framework identified reproducible, asymmetric interactions between tumor, immune, and stromal compartments, most prominently Tumor-EMT → Macrophage (CDS = 0.0828) and Fibroblast → Macrophage (CDS = 0.0582). Effects exceeded label-permutation and spatial-shuffle null models (p < 0.001, FDR-controlled) and remained stable under core-level bootstrap resampling. Inferred directional strengths correlated strongly with matched LR scores (r = 0.758, p = 0.0027), supporting biological concordance. These results demonstrate counterfactual testing as a statistically rigorous and scalable approach for directional cell-cell communication analysis in spatial transcriptomics.
Tat, J.; Lay, F. D.; Stevens, J.; Lewis, N. E.
Chinese hamster ovary (CHO) cells are the dominant host for therapeutic protein production, yet intra- and inter-clonal heterogeneity in manufacturing phenotypes, and the underlying metabolic and secretory circuitry, remain poorly defined at single-cell resolution. Here, we apply secretion-encoded single-cell sequencing (SEC-seq) to simultaneously measure transcriptomes and secreted IgG in single cells from a parental production cell line and five CHO clones, each varying in cell-specific productivity. IgG mRNA and recombinant protein secretion are only moderately correlated across single cells, indicating that transcription alone does not explain intra-clonal secretion heterogeneity. By integrating SEC-seq with single-cell metabolic and secretory task scoring, we find that CHO cells accommodating recombinant protein expression burden have more active translation-associated pathways and suppressed energy-intensive endogenous secreted protein processing. Three high-secreting clones converge on this translation-focused state but differ in their subpopulation composition and energy/redox programs coupled to IgG output: one highly productive clone shows a low-growth, glycolytic, NAD/one-carbon-associated and UPR-activated program; a second shows increased oxidative phosphorylation and fatty-acid β-oxidation; and a third shows higher lipid uptake with modest central carbon metabolism. Genes such as Aldoa, Ndufab1, Acsl5, and Mthfd2 showed clone-specific correlations with IgG, linking glycolysis, mitochondrial respiration, fatty-acid metabolism, and redox to secretion. Together, these results demonstrate that SEC-seq can resolve IgG-coupled metabolic-secretory wiring within and between CHO clones, providing a framework to identify subpopulation and circuit features to engineer or select for improved recombinant protein production.
Zhu, A.; Ho, P.-Y.
Bacterial growth and the underlying metabolic networks are highly dissimilar across species, posing a fundamental challenge for bioengineering tasks involving diverse species. For a given species across nutrient environments, growth is regulated via proteome allocation, which gives rise to linear relationships between growth and the sizes of coarse-grained proteome sectors. However, whether and how coarse-grained growth predictors generalize across species remain unclear. Here, using genome-scale metabolic models, we discover a simple cross-species trend in which the monoculture growth of a species is proportional to the number of nutrients it utilizes, indicating that the latter is a regulatory feature that is conserved across species. By coarse-graining metabolic networks using feature learning, we identify novel proteome sectors whose sizes exhibit cross-species correlations with growth in wide-ranging experiments, suggesting that these sectors are also conserved regulatory features. We further show that the sectors enable a predictive encoding of proteome costs and growth benefits, thereby providing a potential explanation for how coarse-grained network features emerge to be simple determinants of growth across diverse metabolic networks.
Camacho-Mateu, J.; Burgio, G.; Quiros-Rodriguez, I.; D Fernandez-de-Bobadilla, M.; Sanchez, A.
The function of microbial communities is often dominated by additive and pairwise interactions, raising the question of whether this reflects intrinsic biological simplicity or fundamental limits of detection. Here, we leverage the theory of fitness landscapes to bridge microbial ecology and genetics, and show that this apparent simplicity is a generic consequence of structural and statistical constraints rather than evidence for intrinsically weak higher-order interactions (HOIs). We separate the detectability of individual epistatic interactions from their contribution to functional variance, and demonstrate that local k-order interactions suffer from exponential noise amplification while their contributions to total variance are intrinsically suppressed by combinatorial geometric dilution. Applying this framework to a fully sampled 2^10 experimental microbial landscape, we find that only first- and second-order interactions are distinguishable from experimental noise. Furthermore, generalized Lotka-Volterra simulations reveal that experimental noise alone can generate the illusion of higher-order structure in communities where all direct mechanistic interactions are pairwise and indirect interactions are weak. Our findings identify universal, order-dependent limits on the quantification of epistasis that apply to high-dimensional landscapes across ecology and genetics, providing a principled foundation for rational community design.
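One way to see why higher-order terms are harder to detect is the following Python sketch. It assumes the common inclusion-exclusion definition of a local interaction measured relative to the empty community, which may differ in detail from the preprint's formulation, and it uses a made-up three-species landscape. A k-order term combines 2^k measurements with coefficients of +1 or -1, so independent measurement noise of variance s^2 yields a variance of 2^k * s^2 in the estimated term.

```python
from itertools import combinations


def subsets(items):
    """All subsets of `items`, as frozensets."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def local_interaction(landscape, members):
    """Inclusion-exclusion interaction among `members`, relative to the empty community."""
    k = len(members)
    return sum((-1) ** (k - len(t)) * landscape[t] for t in subsets(tuple(members)))


def noise_variance_factor(k):
    """A k-order term sums 2^k measurements with +/-1 weights: variance 2^k * s^2."""
    return 2 ** k


# Hypothetical, purely additive landscape on three species.
effects = {"a": 1.0, "b": 2.0, "c": -0.5}
additive = {s: sum(effects[x] for x in s) for s in subsets(tuple(effects))}

# Same landscape with one pairwise interaction added to every community containing a and b.
with_pair = {s: v + (0.7 if {"a", "b"} <= s else 0.0) for s, v in additive.items()}
```

On the additive landscape every interaction of order two or higher is exactly zero. On the modified landscape only the a-b term is non-zero, yet with noisy data its estimate would carry four times the single-measurement variance, and a third-order term eight times.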